Introduction

Abstract

This project analyzes how county-level conditions affect upward economic mobility for children in low-income families, specifically families at the 25th percentile of the national income distribution in the United States. The data used in this project is from the 1990 birth cohort. The main goal of the analysis is to identify the county-level characteristics associated with higher economic mobility, and to investigate whether growing up in a poorer county inherently means lower upward mobility. The analysis also seeks to identify the county-level characteristics that are best at offsetting the negative mobility effects of poverty. The primary tool for these analyses is a multiple linear regression model, using predictors drawn from two Opportunity Insights datasets to predict the response variable: mean percentile rank in the national distribution of household income at age 27 for children whose parents are at the 25th percentile of national income, pooled across races and genders. Results show that while higher poverty share strongly predicts lower mobility, counties with higher rates of college-educated adults and employment can significantly offset these negative effects.

Research Questions

  • Which county-level characteristics are associated with higher income mobility for children from low-income families?
  • Does growing up in a poorer county always mean lower upward mobility, or do some county characteristics offset the effects of poverty?

Source

The data I used was collected by Opportunity Insights, a nonprofit research organization based at Harvard University. It aims to expand economic opportunity in the United States by identifying barriers to upward mobility and developing solutions that empower people to rise out of poverty. The sample I’m using has 3,115 observations, one for each of 3,115 of the 3,244 total counties in the United States.

Background/Significance

Understanding the drivers of economic mobility is a central topic in economics. The motivation behind this project was to learn about these drivers and how they affect the Americans who need upward mobility the most. Economic mobility reflects how well a society provides a level playing field for all, regardless of where someone starts on the income ladder. Identifying which county-level characteristics are associated with undesirable future outcomes can help mitigate the disadvantages associated with growing up poor.

Data Description and EDA

Variables Used

The variables used in this analysis include the following:

Response variable:

  • kfr_pooled_pooled_p25: Mean percentile rank, relative to other children born the same year, in the national distribution of household income at age 27, for children whose parents are at the 25th percentile of national income, pooled across races and genders.

Explanatory variables:

  • emp_pooled1990: Fraction of children (across all races/genders) from the 1990 birth cohort who are employed at age 27.

  • hhinc_median_pooled1990: Median household income (in 2023 dollars) for the pooled population (all races/genders) in 1990.

  • poor_share_pooled1990: Share of individuals below the federal poverty line (pooled, 1990).

  • frac_coll_pooled1990: Fraction of people aged 25+ with a college degree (bachelor’s or higher), pooled across races/genders, 1990.

  • singlepar_pooled1990: Share of households with children under 18 that have a single parent (either female head/no husband or male head/no wife), pooled across races/genders, 1990.

  • share_black1990: Fraction of population identified as Black in the 1990 Census.

  • foreign_share1990: Fraction of residents who are foreign-born in 1990.

  • gini1990: Measures income inequality for the county in 1990.

  • pop_pooled1990: Total county population in 1990.

Missing Data

There was a relatively small amount of missing data in the sample. The response variable (kfr_pooled_pooled_p25) had 61 missing values and one of the predictors (share_black1990) had 104 missing values. I chose to remove these rows with missing values due to the large size of the sample. This way, the analysis is solely based on real, observed data.

Summaries

Summary Statistic Table

| Variable | Mean | SD | Min | Max |
|---|---|---|---|---|
| kfr_pooled_pooled_p25 | 0.4594614 | 0.0581112 | 0.2026 | 0.9169 |
| emp_pooled1990 | 0.6799673 | 0.0774139 | 0.3070022 | 0.8629962 |
| hhinc_median_pooled1990 | 59005.64 | 16275.52 | 21125.92 | 145716.0 |
| poor_share_pooled1990 | 0.1665286 | 0.0790136 | 0.0218017 | 0.5997913 |
| frac_coll_pooled1990 | 0.1352562 | 0.0659291 | 0.0368934 | 0.5341625 |
| singlepar_pooled1990 | 0.2033841 | 0.0665632 | 0.0480226 | 0.6015037 |
| share_black1990 | 0.0894026 | 0.1452094 | 0.0000791 | 0.8623599 |
| foreign_share1990 | 0.7179417 | 0.1520259 | 0.1346519 | 0.9723646 |
| gini1990 | 0.4240839 | 0.0379707 | 0.27121 | 0.5924208 |
| pop_pooled1990 | 79891.96 | 264827.3 | 675 | 8863164 |

Heatmap of all U.S. counties

Methods used

Regression Model

The first regression model in my analysis estimates the mean household-income percentile rank at age 27, relative to other children born in the same year, for children whose parents are at the 25th percentile of national income.

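Each coefficient reflects the expected difference in the outcome (income rank) for a one-unit change in that characteristic, holding the other predictors fixed. A positive coefficient means that higher values of the characteristic are associated with higher mobility; a negative coefficient means the opposite. Because most predictors are shares measured on a 0 to 1 scale, a one-unit change in them spans the full range from 0% to 100%.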

The model is given by:

kfr_pooled_pooled_p25 = 0.5079 + 0.1875 * emp_pooled1990 - 1.044e-06 * hhinc_median_pooled1990 + 0.03723 * poor_share_pooled1990 + 0.1123 * frac_coll_pooled1990 - 0.4581 * singlepar_pooled1990 - 0.05009 * share_black1990 + 0.003764 * foreign_share1990 - 0.09823 * gini1990 + 7.532e-09 * pop_pooled1990

Diagnostic plots

Research question 1

Which county-level characteristics are associated with higher income mobility for children from low-income families?

According to the regression model, counties with higher employment rates, a larger share of college graduates, and larger populations tend to have higher income mobility for children from low-income families. Counties with a higher share of single-parent households, a larger Black population share, greater income inequality, and higher median household income tend to have lower upward mobility, holding the other predictors fixed. Each of these associations is statistically significant at the 5% level, which makes these variables the most reliable predictors of mobility in the model; the coefficients on poverty share (p = 0.091) and foreign-born share (p = 0.497) are not significant.

Correlation Heatmap

Regression Coefficient Table

| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 0.508 | 0.020 | 25.242 | 0.000 |
| emp_pooled1990 | 0.187 | 0.015 | 12.329 | 0.000 |
| hhinc_median_pooled1990 | 0.000 | 0.000 | -11.390 | 0.000 |
| poor_share_pooled1990 | 0.037 | 0.022 | 1.688 | 0.091 |
| frac_coll_pooled1990 | 0.112 | 0.017 | 6.704 | 0.000 |
| singlepar_pooled1990 | -0.458 | 0.018 | -25.735 | 0.000 |
| share_black1990 | -0.050 | 0.008 | -6.605 | 0.000 |
| foreign_share1990 | 0.004 | 0.006 | 0.679 | 0.497 |
| gini1990 | -0.098 | 0.032 | -3.047 | 0.002 |
| pop_pooled1990 | 0.000 | 0.000 | 2.588 | 0.010 |

Research question 2

Does growing up in a poorer county always mean lower upward mobility, or do some county characteristics offset the effects of poverty?

To answer this question, I estimated regression models with and without the college-educated share and the employment rate, and compared their fit using AIC, where a lower value indicates a better fit. The AIC rose from -10901.95 to -10658.01 when these two variables were removed from the model. This result suggests that growing up in a poorer county does not always mean lower upward mobility. There is statistical evidence that factors like education and employment can significantly reduce the negative impact of poverty.

AIC Comparison of Regression Models

| Model | Description | AIC |
|---|---|---|
| Model 1 | All predictors | -10901.95 |
| Model 2 | No college, no employment | -10658.01 |

As shown in the table, the AIC is about 244 units lower in the full model, meaning that adding education and employment greatly improves explanatory power. According to the bar chart below, the share of college-educated adults and the employment rate have larger estimated coefficients than the poverty rate, so they appear to play a more powerful role in supporting future economic success.

Conclusion

Discussion

In doing this project, I learned a lot about which variables help predict upward mobility for low-income children in America. While poverty remains a barrier, this analysis shows its negative effect can be substantially offset in counties with a wide range of job opportunities and higher levels of education. However, several limitations should be noted. The model relies on observational, cross-sectional data, limiting our ability to make strong claims about causality. Factors such as school quality and neighborhood effects are not measured in this dataset. Additionally, even the small amount of missing data could influence the accuracy of the estimates. Overall, these findings provide valuable insight into the kinds of actions policymakers and communities can take to promote upward mobility in the areas of America that need it the most.

About the Author

My name is Scott Robbins. I am currently pursuing a Bachelor of Arts in Economics with a minor in data analytics.

References

Opportunity Insights. (2024). Codebook for Table 3: County-Level Outcomes by Birth Cohort, Parental Income, Race, and Gender. https://opportunityinsights.org/wp-content/uploads/2024/07/ChangingOpportunity_Codebook_Table_3_County_by_Cohort_Estimates.pdf

Opportunity Insights. (2024). Codebook for Table 8: County-level Covariates. https://opportunityinsights.org/wp-content/uploads/2024/07/ChangingOpportunity_Codebook_Table_8_County_Covariates.pdf

---
title: "Drivers of Economic Mobility"
output: 
  flexdashboard::flex_dashboard:
    theme: simplex

    orientation: columns
    vertical_layout: fill
    source_code: embed
---

```{r setup, include=FALSE}
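library(flexdashboard)
library(broom)      # tidy() for the coefficient table
library(knitr)      # kable()
library(corrplot)   # correlation heatmap
library(tidyverse)  # attaches dplyr, ggplot2, and readr
library(MASS)       # stepAIC(); masks dplyr::select(), hence the explicit dplyr::select() calls below
library(maps)       # county polygons used by map_data()
library(gridExtra)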
df1 <- read_csv("~/Downloads/county_by_cohort_estimates.csv")
df2 <- read_csv("~/Downloads/Table_8_county_covariates.csv")
# Keep the 1990 birth cohort and the outcome of interest.
outcomes_small <- df1 %>%
  filter(cohort == 1990) %>%
  dplyr::select(state, county, state_name, county_name,
                kfr_pooled_pooled_p25)

covars_small <- df2 %>%
  dplyr::select(
    state, county,
    emp_pooled1990,
    hhinc_median_pooled1990,
    poor_share_pooled1990,
    frac_coll_pooled1990,
    singlepar_pooled1990,
    share_black1990,
    foreign_share1990,
    gini1990,
    pop_pooled1990
  )

df <- outcomes_small %>%
  inner_join(covars_small, by = c("state", "county"))

colSums(is.na(df))


df_complete <- df[complete.cases(df), ]

colSums(is.na(df_complete))
model1 <- lm(kfr_pooled_pooled_p25 ~ emp_pooled1990 + hhinc_median_pooled1990
                       + poor_share_pooled1990+frac_coll_pooled1990+singlepar_pooled1990
                       +share_black1990+foreign_share1990+gini1990+pop_pooled1990, data = df_complete)
model2 <- lm(kfr_pooled_pooled_p25 ~ hhinc_median_pooled1990
            + poor_share_pooled1990+singlepar_pooled1990
            +share_black1990+foreign_share1990+gini1990+pop_pooled1990, data = df_complete) 

quant_vars <- c("emp_pooled1990", "hhinc_median_pooled1990", "poor_share_pooled1990", 
                "frac_coll_pooled1990", "singlepar_pooled1990", "share_black1990", 
                "foreign_share1990", "gini1990", "pop_pooled1990")

stepwise_aic <- stepAIC(model1, direction = "both", trace = TRUE)
stepwise_aic2 <- stepAIC(model2, direction = "both", trace = TRUE)


summary_table <- df %>%
  dplyr::select(all_of(c("kfr_pooled_pooled_p25", quant_vars))) %>%
  summarise(across(everything(),
                   list(Mean = ~mean(., na.rm=TRUE),
                        SD = ~sd(., na.rm=TRUE),
                        Min = ~min(., na.rm=TRUE),
                        Max = ~max(., na.rm=TRUE)), 
                   .names = "{.col}_{.fn}"))
# Summary statistics for the response and all predictors.
# unname() drops the names that sapply() attaches, so the data frame gets no
# row names and kable() does not print each variable name twice.
summary_vars <- c("kfr_pooled_pooled_p25", quant_vars)
summary_long <- data.frame(
  Variable = summary_vars,
  Mean = unname(sapply(df[summary_vars], mean, na.rm = TRUE)),
  SD   = unname(sapply(df[summary_vars], sd,   na.rm = TRUE)),
  Min  = unname(sapply(df[summary_vars], min,  na.rm = TRUE)),
  Max  = unname(sapply(df[summary_vars], max,  na.rm = TRUE))
)
cor_matrix <- cor(df[, c("kfr_pooled_pooled_p25", quant_vars)], use = "complete.obs")
county_map <- map_data("county")
county_map <- county_map %>%
  mutate(
    state = tolower(region),
    county = tolower(subregion)
  )

df_map <- df %>%
  mutate(
    state = tolower(state_name),   
    county = tolower(county_name)
  )

plot_data <- inner_join(county_map, df_map, by = c("state", "county"))
coef_table <- tidy(model1)



aic1 <- AIC(model1)
aic2 <- AIC(model2)


aic_table <- data.frame(
  Model = c("Model 1", "Model 2"),
  Description = c("All predictors", "No college, no employment"),
  AIC = c(aic1, aic2)
)

main_effects <- data.frame(
  Variable = c("Poverty Rate", "College-Educated Share (age 25+)", "Emp. Rate at age 27"),
  Coefficient = c(
    coef(model1)[["poor_share_pooled1990"]],
    coef(model1)[["frac_coll_pooled1990"]],
    coef(model1)[["emp_pooled1990"]]
  )
)
```

Introduction
===
Column {data-width=1300}
---
### Abstract
This project analyzes how county-level conditions affect upward economic mobility for children in low-income families, specifically families at the 25th percentile of the national income distribution in the United States. The data used in this project is from the 1990 birth cohort. The main goal of the analysis is to identify the county-level characteristics associated with higher economic mobility, and to investigate whether growing up in a poorer county inherently means lower upward mobility. The analysis also seeks to identify the county-level characteristics that are best at offsetting the negative mobility effects of poverty. The primary tool for these analyses is a multiple linear regression model, using predictors drawn from two Opportunity Insights datasets to predict the response variable: mean percentile rank in the national distribution of household income at age 27 for children whose parents are at the 25th percentile of national income, pooled across races and genders. Results show that while higher poverty share strongly predicts lower mobility, counties with higher rates of college-educated adults and employment can significantly offset these negative effects.


Column {data-width=1000}
-----------------------------------------------------------------------

### Research Questions
* Which county-level characteristics are associated with higher income mobility for children from low-income families?
* Does growing up in a poorer county always mean lower upward mobility, or do some county characteristics offset the effects of poverty?

### Source
The data I used was collected by Opportunity Insights, a nonprofit research organization based at Harvard University. It aims to expand economic opportunity in the United States by identifying barriers to upward mobility and developing solutions that empower people to rise out of poverty. The sample I'm using has 3,115 observations, one for each of 3,115 of the 3,244 total counties in the United States.


Column {data-width=1100}
---

### Background/Significance

Understanding the drivers of economic mobility is a central topic in economics. The motivation behind this project was to learn about these drivers and how they affect the Americans who need upward mobility the most. Economic mobility reflects how well a society provides a level playing field for all, regardless of where someone starts on the income ladder. Identifying which county-level characteristics are associated with undesirable future outcomes can help mitigate the disadvantages associated with growing up poor.











Data Description and EDA
===

Column {.tabset}
---
### Variables Used


The variables used in this analysis include the following:

#### Response variable: 
- kfr_pooled_pooled_p25: Mean percentile rank, relative to other children born the same year, in the national distribution of household income at age 27, for children whose parents are at the 25th percentile of national income, pooled across races and genders.

#### Explanatory variables:
- emp_pooled1990: Fraction of children (across all races/genders) from the 1990 birth cohort who are employed at age 27.

- hhinc_median_pooled1990: Median household income (in 2023 dollars) for the pooled population (all races/genders) in 1990.

- poor_share_pooled1990: Share of individuals below the federal poverty line (pooled, 1990).

- frac_coll_pooled1990: Fraction of people aged 25+ with a college degree (bachelor's or higher), pooled across races/genders, 1990.

- singlepar_pooled1990: Share of households with children under 18 that have a single parent (either female head/no husband or male head/no wife), pooled across races/genders, 1990.

- share_black1990: Fraction of population identified as Black in the 1990 Census.

- foreign_share1990: Fraction of residents who are foreign-born in 1990.

- gini1990: Measures income inequality for the county in 1990.

- pop_pooled1990: Total county population in 1990.



#### Missing Data

There was a relatively small amount of missing data in the sample. The response variable (kfr_pooled_pooled_p25) had 61 missing values and one of the predictors (share_black1990)
had 104 missing values. I chose to remove these rows with missing values due to the large size of the sample. This way, the analysis is solely based on real, observed data.
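
The listwise deletion described above can be sketched on a small made-up data frame. The `toy` table and its values are assumptions for illustration only; the real analysis applies the same two calls to the merged county data.

```r
# Toy stand-in for the merged county data (all values are made up).
toy <- data.frame(
  county = c("A", "B", "C", "D"),
  kfr_pooled_pooled_p25 = c(0.46, NA, 0.51, 0.40),
  share_black1990 = c(0.09, 0.12, NA, 0.03)
)

# Count missing values per column before dropping anything.
na_counts <- colSums(is.na(toy))

# Keep only rows with no missing values in any column (listwise deletion).
toy_complete <- toy[complete.cases(toy), ]

print(na_counts)
print(nrow(toy_complete))
```

Comparing `nrow(toy)` with `nrow(toy_complete)` is a quick way to confirm how many counties the deletion removes.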




### Summaries

#### Summary Statistic Table
```{r}
kable(summary_long, format = "simple", row.names = FALSE)
```







#### Heatmap of all U.S. counties
```{r}
ggplot(plot_data, aes(long, lat, group = group, fill = kfr_pooled_pooled_p25)) +
  geom_polygon(color = "white", linewidth = 0.1) +  # `linewidth` replaces `size` for lines in ggplot2 >= 3.4.0
  coord_fixed(1.3) +
  scale_fill_viridis_c(option = "plasma") +
  theme_void() +
  labs(title = "Income Mobility by County")

```

Methods used
===

### Regression Model

The first regression model in my analysis estimates the mean household-income percentile rank at age 27, relative to other children born in the same year, for children whose parents are at the 25th percentile of national income.

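Each coefficient reflects the expected difference in the outcome (income rank) for a one-unit change in that characteristic, holding the other predictors fixed. A positive coefficient means that higher values of the characteristic are associated with higher mobility; a negative coefficient means the opposite. Because most predictors are shares measured on a 0 to 1 scale, a one-unit change in them spans the full range from 0% to 100%.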
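
As a worked example of this interpretation, the short sketch below scales the single-parent coefficient from the fitted model to a more realistic difference between two counties. The 10 percentage point gap is a hypothetical value chosen for illustration.

```r
# Coefficient on singlepar_pooled1990 from the fitted model in this dashboard.
b_singlepar <- -0.4581

# Hypothetical gap between two counties: 10 percentage points (0.10 on the 0-1 scale).
delta_share <- 0.10

# Expected difference in mean income rank, holding other predictors fixed.
delta_rank <- b_singlepar * delta_share

# Express the difference in percentile points (rank is on a 0-1 scale).
delta_rank_points <- 100 * delta_rank

cat("Expected change in rank:", round(delta_rank_points, 2), "percentile points\n")
```

Under these assumptions, a county with a single-parent share 10 points higher is expected to have a mean rank about 4.6 percentile points lower.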

The model is given by:

kfr_pooled_pooled_p25 = 0.5079 +
  0.1875 * emp_pooled1990 -
  1.044e-06 * hhinc_median_pooled1990 +
  0.03723 * poor_share_pooled1990 +
  0.1123 * frac_coll_pooled1990 -
  0.4581 * singlepar_pooled1990 -
  0.05009 * share_black1990 +
  0.003764 * foreign_share1990 -
  0.09823 * gini1990 +
  7.532e-09 * pop_pooled1990
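
To see how the equation is used, the sketch below plugs the sample means from the summary statistics table into the fitted coefficients. It is only an illustration: the means were computed before rows with missing values were dropped, and the coefficients are rounded, so the result will not match the model's exact fitted value.

```r
# Rounded coefficients from the model equation above.
coefs <- c(
  intercept               = 0.5079,
  emp_pooled1990          = 0.1875,
  hhinc_median_pooled1990 = -1.044e-06,
  poor_share_pooled1990   = 0.03723,
  frac_coll_pooled1990    = 0.1123,
  singlepar_pooled1990    = -0.4581,
  share_black1990         = -0.05009,
  foreign_share1990       = 0.003764,
  gini1990                = -0.09823,
  pop_pooled1990          = 7.532e-09
)

# Sample means from the summary statistics table (intercept term set to 1).
county_means <- c(
  intercept               = 1,
  emp_pooled1990          = 0.6799673,
  hhinc_median_pooled1990 = 59005.64,
  poor_share_pooled1990   = 0.1665286,
  frac_coll_pooled1990    = 0.1352562,
  singlepar_pooled1990    = 0.2033841,
  share_black1990         = 0.0894026,
  foreign_share1990       = 0.7179417,
  gini1990                = 0.4240839,
  pop_pooled1990          = 79891.96
)

# Predicted rank for a hypothetical "average" county: sum of coefficient * value.
pred <- sum(coefs * county_means[names(coefs)])
print(round(pred, 3))
```

The prediction comes out at roughly 0.459, close to the sample mean of the response, which is what a linear regression evaluated near the predictor means should produce.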


### Diagnostic plots

```{r, fig.width=20, fig.height=5}
par(mfrow = c(1, 4))
par(mar = c(4, 4, 2, 1)) 
plot(model1)
par(mfrow = c(1, 1))

```


Research question 1
===

#### Which county-level characteristics are associated with higher income mobility for children from low-income families?

According to the regression model, counties with higher employment rates, a larger share of college graduates, and larger populations tend to have higher income mobility for children from low-income families. Counties with a higher share of single-parent households, a larger Black population share, greater income inequality, and higher median household income tend to have lower upward mobility, holding the other predictors fixed. Each of these associations is statistically significant at the 5% level, which makes these variables the most reliable predictors of mobility in the model; the coefficients on poverty share (p = 0.091) and foreign-born share (p = 0.497) are not significant.


#### Correlation Heatmap
```{r}
corrplot(cor_matrix, method = "color")

```

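```{r}
# Show 3 significant digits so very small coefficients
# (median income, population) do not round to 0.000.
coef_table %>%
  mutate(across(where(is.numeric), ~ formatC(.x, format = "g", digits = 3))) %>%
  kable(caption = "Regression Coefficient Table", align = "lrrrr")
```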
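
One way to read significance off a fitted model is sketched below with base R on simulated data. The variables `x1`, `x2`, and `y` are made up for illustration; in the dashboard the same idea applies to `model1`.

```r
set.seed(1)
n <- 200

# Simulated predictors: x1 has a true effect on y, x2 has none by construction.
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 0.5 + 0.8 * x1 + rnorm(n)

fit <- lm(y ~ x1 + x2)

# Coefficient matrix: estimate, standard error, t statistic, p-value.
coef_mat <- summary(fit)$coefficients
p_values <- coef_mat[, "Pr(>|t|)"]

# Terms that are statistically significant at the 5% level.
significant <- names(p_values)[p_values < 0.05]

print(round(coef_mat, 3))
print(significant)
```

A term with a large p-value, such as poverty share in the table above, should not be read as having no relationship with the outcome, only as one the model cannot distinguish from zero after controlling for the other predictors.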






Research question 2
===

#### Does growing up in a poorer county always mean lower upward mobility, or do some county characteristics offset the effects of poverty?

To answer this question, I estimated regression models with and without the college-educated share and the employment rate, and compared their fit using AIC, where a lower value indicates a better fit. The AIC rose from -10901.95 to -10658.01 when these two variables were removed from the model. This result suggests that growing up in a poorer county does not always mean lower upward mobility. There is statistical evidence that factors like education and employment can significantly reduce the negative impact of poverty.

```{r}
kable(aic_table, caption = "AIC Comparison of Regression Models")
```

As shown in the table, the AIC is about 244 units lower in the full model, meaning that adding education and employment greatly improves explanatory power. According to the bar chart below, the share of college-educated adults and the employment rate have larger estimated coefficients than the poverty rate, so they appear to play a more powerful role in supporting future economic success.


```{r, fig.width= 10, fig.height=5}
ggplot(main_effects, aes(x = Variable, y = Coefficient, fill = Variable)) +
  geom_col(width = 0.7) +
  labs(title = "Main County-Level Effects on Upward Mobility",
       y = "Estimated Coefficient",
       x = "") +
  theme_minimal() +
  scale_fill_brewer(palette = "Set2") +
  geom_text(aes(label = round(Coefficient, 3)), vjust = -0.5)
```
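
The AIC comparison above can be sketched on simulated data. The data frame `sim`, its coefficients, and the noise level are all assumptions chosen for illustration; only the two reported AIC values at the end come from the analysis.

```r
set.seed(42)
n <- 500

# Simulated county-level predictors on a 0-1 scale (made-up ranges).
sim <- data.frame(
  poverty    = runif(n, 0.05, 0.40),
  college    = runif(n, 0.05, 0.50),
  employment = runif(n, 0.40, 0.85)
)

# Simulated outcome in which college share and employment both matter.
sim$mobility <- 0.30 - 0.10 * sim$poverty + 0.15 * sim$college +
  0.20 * sim$employment + rnorm(n, sd = 0.03)

# Full model versus a reduced model without college share and employment.
full    <- lm(mobility ~ poverty + college + employment, data = sim)
reduced <- lm(mobility ~ poverty, data = sim)

# A positive gap means the full model has the lower (better) AIC.
aic_gap <- AIC(reduced) - AIC(full)
print(aic_gap)

# The same subtraction applied to the AIC values reported in the table.
reported_gap <- -10658.01 - (-10901.95)
print(reported_gap)
```

AIC penalizes each extra parameter, so a gap this large indicates the added predictors improve fit by far more than their cost in model complexity.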



Conclusion
===


### Discussion

In doing this project, I learned a lot about which variables help predict upward mobility for low-income children in America. While poverty remains a barrier, this analysis shows its negative effect can be substantially offset in counties with a wide range of job opportunities and higher levels of education. However, several limitations should be noted. The model relies on observational, cross-sectional data, limiting our ability to make strong claims about causality. Factors such as school quality and neighborhood effects are not measured in this dataset. Additionally, even the small amount of missing data could influence the accuracy of the estimates. Overall, these findings provide valuable insight into the kinds of actions policymakers and communities can take to promote upward mobility in the areas of America that need it the most.

### About the Author

My name is Scott Robbins. I am currently pursuing a Bachelor of Arts in Economics with a minor in data analytics.


### References

Opportunity Insights. (2024). Codebook for Table 3: County-Level Outcomes by Birth Cohort, Parental Income, Race, and Gender. https://opportunityinsights.org/wp-content/uploads/2024/07/ChangingOpportunity_Codebook_Table_3_County_by_Cohort_Estimates.pdf

Opportunity Insights. (2024). Codebook for Table 8: County-level Covariates. https://opportunityinsights.org/wp-content/uploads/2024/07/ChangingOpportunity_Codebook_Table_8_County_Covariates.pdf